{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Problems"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 9-1 Largest $i$ numbers in sorted order\n",
    "\n",
    "> Given a set of $n$ numbers, we wish to find the $i$ largest in sorted order using a comparison-based algorithm. Find the algorithm that implements each of the following methods with the best asymptotic worst-case running time, and analyze the running times of the algorithms in terms of $n$ and $i$ ."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> __*a*__. Sort the numbers, and list the $i$ largest."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Depends on the sorting algorithm, with heap sort the worst-case is $O(n\\lg n + i)$."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> __*b*__. Build a max-priority queue from the numbers, and call EXTRACT-MAX $i$ times."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Build the heap is $O(n)$, extract is $O(\\lg n)$, thus the worst time is $O(n + i\\lg n)$."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> __*c*__. Use an order-statistic algorithm to find the $i$th largest number, partition around that number, and sort the $i$ largest numbers."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "$O(n + n + i\\lg i) = O(n + i\\lg i)$."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 9-2 Weighted median\n",
    "\n",
    "> For $n$ distinct elements $x_1, x_2, \\dots, x_n$ with positive weights $w_1, w_2, \\dots, w_n$ such that $\\sum_{i=1}^n w_i = 1$, the __*weighted (lower) median*__ is the element $x_k$ satisfying\n",
    "\n",
    "> $$ \\sum_{x_i < x_k} w_i < \\frac{1}{2}$$\n",
    "\n",
    "> and\n",
    "\n",
    "> $$ \\sum_{x_i > x_k} w_i \\le \\frac{1}{2}$$\n",
    "\n",
    "> __*a*__. Argue that the median of $x_1, x_2, \\dots, x_n$ is the weighted median of the $x_i$ with weights $w_i = 1/n$ for $i=1,2,\\dots,n$."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "$$\n",
    "\\begin{array}{lll}\n",
    "\\displaystyle \\sum_{x_i < x_k} \\frac{1}{n} < \\frac{1}{2} & &\n",
    "\\displaystyle \\sum_{x_i > x_k} \\frac{1}{n} \\le \\frac{1}{2} \\\\\n",
    "\\displaystyle \\frac{m-1}{n} < \\frac{1}{2} & &\n",
    "\\displaystyle \\frac{n - m}{n} \\ge \\frac{1}{2} \\\\\n",
    "\\displaystyle m < \\frac{n}{2} + 1 & &\n",
    "\\displaystyle m \\ge \\frac{n}{2} \\\\\n",
    "\\end{array}\n",
    "$$\n",
    "\n",
    "$$\n",
    "\\therefore \\frac{n}{2} \\le m < \\frac{n}{2} + 1\n",
    "$$\n",
    "\n",
    "$$\n",
    "\\therefore  m = \\left \\lfloor \\frac{n}{2} \\right \\rfloor\n",
    "$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> __*b*__. Show how to compute the weighted median of $n$ elements in $O(n \\lg n)$ worstcase time using sorting."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "def weighted_median(x):\n",
    "    x.sort()\n",
    "    s = 0.0\n",
    "    for i in range(len(x)):\n",
    "        s += x[i]\n",
    "        if s >= 0.5:\n",
    "            return x[i]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> __*c*__. Show how to compute the weighted median in $\\Theta(n)$ worst-case time using a linear-time median algorithm such as SELECT from Section 9.3."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Use the median as pivot, the algorithm is exactly $T(n)=T(n/2)+\\Theta(n) = \\Theta(n)$."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "def black_box_median(a, p, r):\n",
    "    return sorted(a[p:r])[(r - p - 1) // 2]\n",
    "\n",
    "\n",
    "def partition(a, p, r, x):\n",
    "    s = x\n",
    "    i = p - 1\n",
    "    for j in xrange(p, r - 1):\n",
    "        if a[j] == x:\n",
    "            a[j], a[r - 1] = a[r - 1], a[j]\n",
    "            break\n",
    "    for j in xrange(p, r - 1):\n",
    "        if a[j] < x:\n",
    "            i += 1\n",
    "            s += a[j]\n",
    "            a[i], a[j] = a[j], a[i]\n",
    "    i += 1\n",
    "    a[i], a[r - 1] = a[r - 1], a[i]\n",
    "    return i, s\n",
    "\n",
    "\n",
    "def weighted_median(a, p, r, w=0.5):\n",
    "    if p + 1 >= r:\n",
    "        return a[p]\n",
    "    x = black_box_median(a, p, r)\n",
    "    q, s = partition(a, p, r, x)\n",
    "    if s - a[q] < w and s >= w:\n",
    "        return a[q]\n",
    "    if s >= w:\n",
    "        return weighted_median(a, p, q, w)\n",
    "    return weighted_median(a, q + 1, r, w - s)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> The __*post-office location problem*__ is defined as follows. We are given $n$ points $p_1, p_2, \\dots, p_n$ with associated wegihts $w_1, w_2, \\dots, w_n$. We wish to find a point $p$ that minimizes the sum $\\sum_{i=1}^n w_i d(p, p_i)$, where $d(a,b)$ is the distance between points $a$ and $b$."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> __*d*__. Argue that the weighted median is a best solution for the 1-dimensional postoffice location problem, in which points are simply real numbers and the distance between points $a$ and $b$ is $d(a, b) = |a - b|$."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Same as Exercise 9.3-9."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> __*e*__. Find the best solution for the 2-dimensional post-office location problem, in which the points are $(x,y)$ coordinate pairs and the distance between points $a=(x_1, y-1)$ and $b=(x_2, y_2)$ is the __*Manhattan distance*__ given by $d(a,b)=|x_1-x_2|+|y_1-y_2|$."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Since $x$ and $y$ are independent, the best solution is the medians of $x$ and $y$ separately."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 9-3 Small order statistics"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> We showed that the worst-case number $T(n)$ of comparisons used by SELECT to select the $i$th order statistic from $n$ numbers satisfies $T(n)=\\Theta(n)$, but the constant hidden by the $\\Theta$-notation is rather large. When $i$ is small relative to $n$, we can implement a different procedure that uses SELECT as a subroutine but makes fewer comparisons in the worst case."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> __*a*__. Describe an algorithm that uses $U_i(n)$ comparisons to find the $i$th smallest of $n$ elements, where\n",
    "\n",
    "> $$ U_i(n) = \\left \\{ \\begin{array}{ll}\n",
    "T(n) & \\text{if}~~i \\ge n/2 \\\\\n",
    "\\lfloor n / 2 \\rfloor + U_i(\\lceil n / 2 \\rceil) + T(2i) & \\text{otherwise}\n",
    "\\end{array} \\right .$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Divide elements into pairs and compare each pair. Recursively deal with the set with the smaller elements in each pair, and return the $i$ smallest elements by partition, then the $i$th element belong to the pairs that appeared in the $i$ smallest elements."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> __*b*__. Show that, if $i < n/2$, then $U_i(n)=n+O(T(2i)\\lg(n/i))$."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Suppose $U_i(n) \\le n + cT(2i) \\lg(n/i)$,\n",
    "\n",
    "$$\n",
    "\\begin{array}{rlll}\n",
    "T(n) &=& \\lfloor n / 2 \\rfloor + U_i(\\lceil n / 2 \\rceil) + T(2i) \\\\\n",
    "&\\le& n/2 + n/2 + cT(2i) \\lg(n/2i) + T(2i) \\\\\n",
    "&=& n + cT(2i) \\lg(n/i) + T(2i)(1-c) \\\\\n",
    "&\\le& n + cT(2i) \\lg(n/i) & (c \\ge 1) \\\\\n",
    "\\end{array}\n",
    "$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> __*c*__. Show that if $i$ is a constant less than $n/2$, then $U_i(n)= n + O(\\lg n)$."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "$$\n",
    "\\begin{array}{rll}\n",
    "U_i(n) &=& n+O(T(2i)\\lg(n/i)) \\\\\n",
    "&=& n + O(O(1)(\\lg n - lg i)) \\\\\n",
    "&=& n + O(\\lg n - O(1)) \\\\\n",
    "&=& n + O(\\lg n) \\\\\n",
    "\\end{array}\n",
    "$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> __*d*__. Show that if $i=n/k$ for $k \\ge 2$, then $U_i(n)=n+O(T(2n/k)\\lg k)$."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "$$\n",
    "\\begin{array}{rll}\n",
    "U_i(n) &=& n+O(T(2i)\\lg(n/i)) \\\\\n",
    "&=& n+O(T(2n/k)\\lg(k/2)) \\\\\n",
    "&=& n+O(T(2n/k)\\lg k) \\\\\n",
    "\\end{array}\n",
    "$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 9-4 Alternative analysis of randomized selection\n",
    "\n",
    "> In this problem, we use indicator random variables to analyze the RANDOMIZED-SELECT procedure in a manner akin to our analysis of RANDOMIZED-QUICKSORT in Section 7.4.2.\n",
    "\n",
    "> As in the quicksort analysis, we assume that all elements are distinct, and we rename the elements of the input array $A$ as $z_1, z_2, \\dots, z_n$, where $z_i$ is the $i$th smallest element. Thus, the call RANDOMIZED-SELECT$(A, 1, n, k)$ returns $z_k$.\n",
    "\n",
    "> For $1 \\le i < j \\le n$, let \n",
    "\n",
    "> $X_{ijk} = I \\{z_i$ is compared with $z_j$ sometime during the execution of the algorithm to find $z_k\\}$."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> __*a*__. Give an exact expression for $\\text{E}[X_{ijk}]$."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "$$\n",
    "\\text{E}[X_{ijk}] = \\left \\{ \n",
    "\\begin{array}{ll}\n",
    "\\displaystyle \\frac{2}{j - k + 1} & (k \\le i < j) \\\\\n",
    "\\displaystyle \\frac{2}{j - i + 1} & (i \\le k \\le j) \\\\\n",
    "\\displaystyle \\frac{2}{k - i + 1} & (i < j \\le k) \\\\\n",
    "\\end{array}\n",
    "\\right .\n",
    "$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> __*b*__. Let $X_k$ denote the total number of comparisons between elements of array $A$ when finding $z_k$. Show that\n",
    "\n",
    "> $$\n",
    "\\text{E}[X_k] \\le 2 \\left ( \n",
    "\\sum_{i=1}^{k}\\sum_{j=k}^n \\frac{1}{j-i+1} +\n",
    "\\sum_{j=k+1}^{n} \\frac{j-k-1}{j-k+1} +\n",
    "\\sum_{i=1}^{k-2} \\frac{k-i-1}{k-i+1}\n",
    "\\right )\n",
    "$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "$$\n",
    "\\begin{array}{rll}\n",
    "\\text{E}[X_k] &=& \\displaystyle \\sum_{i=1}^{n-1} \\sum_{j=i+1}^n \\text{E}[X_{ijk}] \\\\\n",
    "&=& \\displaystyle \\sum_{i=k+1}^{n-1} \\sum_{j=i+1}^n \\text{E}[X_{ijk}] +\n",
    "\\sum_{i=1}^{k} \\sum_{j=k}^n \\text{E}[X_{ijk}] +\n",
    "\\sum_{i=1}^{k-2} \\sum_{j=i+1}^{k-1} \\text{E}[X_{ijk}] \\\\\n",
    "&=& \\displaystyle \\sum_{i=k+1}^{n-1} \\sum_{j=i+1}^n \\frac{2}{j - k + 1} +\n",
    "\\sum_{i=1}^{k} \\sum_{j=k}^n \\frac{2}{j - i + 1} +\n",
    "\\sum_{i=1}^{k-2} \\sum_{j=i+1}^{k-1} \\frac{2}{k - i + 1} \\\\\n",
    "&=& 2 \\cdot \\left (\n",
    "\\displaystyle \\sum_{i=k+1}^{n-1} \\sum_{j=i+1}^n \\frac{1}{j - k + 1} +\n",
    "\\sum_{i=1}^{k} \\sum_{j=k}^n \\frac{1}{j - i + 1} +\n",
    "\\sum_{i=1}^{k-2} \\sum_{j=i+1}^{k-1} \\frac{1}{k - i + 1} \n",
    "\\right ) \\\\\n",
    "&=& 2 \\left (\n",
    "\\displaystyle \\sum_{j=k+2}^{n-1} \\sum_{i=k+1}^{j-1} \\frac{1}{j - k + 1} +\n",
    "\\sum_{i=1}^{k} \\sum_{j=k}^n \\frac{1}{j - i + 1} +\n",
    "\\sum_{i=1}^{k-2} \\sum_{j=i+1}^{k-1} \\frac{1}{k - i + 1}\n",
    "\\right ) \\\\\n",
    "&=& 2 \\left (\n",
    "\\displaystyle \\sum_{j=k+2}^{n-1} \\frac{j-k-1}{j - k + 1} +\n",
    "\\sum_{i=1}^{k} \\sum_{j=k}^n \\frac{1}{j - i + 1} +\n",
    "\\sum_{i=1}^{k-2} \\frac{k-i-1}{k - i + 1}\n",
    "\\right ) \\\\\n",
    "&\\le& 2 \\left (\n",
    "\\displaystyle \\sum_{j=k+1}^{n} \\frac{j-k-1}{j - k + 1} +\n",
    "\\sum_{i=1}^{k} \\sum_{j=k}^n \\frac{1}{j - i + 1} +\n",
    "\\sum_{i=1}^{k-2} \\frac{k-i-1}{k - i + 1}\n",
    "\\right ) \\\\\n",
    "&=& 2 \\left ( \n",
    "\\displaystyle \n",
    "\\sum_{i=1}^{k}\\sum_{j=k}^n \\frac{1}{j-i+1} +\n",
    "\\sum_{j=k+1}^{n} \\frac{j-k-1}{j-k+1} +\n",
    "\\sum_{i=1}^{k-2} \\frac{k-i-1}{k-i+1}\n",
    "\\right )\n",
    "\\end{array}\n",
    "$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> __*c*__. Show that $E[X_k] \\le 4n$."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Based on [StackExchange](http://math.stackexchange.com/questions/529208/inequality-sumk-i-1-sumn-j-k1-overj-i-1-le-n), \n",
    "\n",
    "$$\n",
    "\\begin{array}{rll}\n",
    "\\displaystyle \\sum_{i=1}^{k}\\sum_{j=k}^n \\frac{1}{j-i+1} &\\le& n \n",
    "\\end{array}\n",
    "$$\n",
    "\n",
    "And\n",
    "\n",
    "$$\n",
    "\\displaystyle\n",
    "\\sum_{j=k+1}^{n} \\frac{j-k-1}{j-k+1} + \\sum_{i=1}^{k-2} \\frac{k-i-1}{k-i+1} \\le \\sum_{j=k+1}^{n} 1 + \\sum_{i=1}^{k-2} 1 = n - k + k - 2 = n - 2 < n\n",
    "$$\n",
    "\n",
    "Therefore $E[X_k] \\le 4n$."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> __*d*__. Conclude that, assuming all elements of array $A$ are distinct, RANDOMIZED-SELECT runs in expected time $O(n)$."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "$O(4n) = O(n)$."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
